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Abstract We study source localization from high dimensional M/EEG data by extend¬ 
ing a multiscale method based on Entropic inference devised to increase the spatial reso¬ 
lution of inverse problems . This method is used to construct informative prior distribu¬ 
tions in a manner inspired in the context of fMRI (Amaral et al 2004 UU) . We construct 
a set of renormalized lattices that approximate the cortex region where the source ac¬ 
tivity is located and address the related problem of defining the relevant variables in a 
coarser scale representation of the cortex. The priors can be used in conjunction with 
other Bayesian methods such as the Variational Bayes method (VB, Sato et al 2004 112211 ). 

The central point of the algorithm is that it uses a posterior obtained at a coarse scale 
to induce a prior at the next finer scale stage of the problem. We present results which 
suggest, on simulated data, that this way of including prior information is a useful aid 
for the source location problem. This is judged by the rate and magnitude of errors in 
source localization. Better convergence times are also achieved. We also present results 
on public data collected during a face recognition task. 

Keywords: Maximum Entropy, priors, cortical source localization, EEG, MEG,Bayesian algo¬ 
rithms, inverse problems 


1 


1 Introduction 


Designing improved methods of inference by updating probabilities as new information is 
aquired, depends partly on the development of ideas of how to incorporate prior information. 
In this paper we concentrate on the construction of prior distributions for the localization 
of cortical dipole sources in EEG or MEG high resolution experiments. Several groups have 
presented very encouraging results with respect to source localization (for a review see II211I ). 
After data aquisition, the source localization analysis problem is divided into three essentially 
separate problems: first, construction of the prior; second, of the likelihood or model and 
third, of the algorithm to extract the information from the resulting posterior. Several such 
information extraction algorithms have been systematically analyzed by Wipf and Nagarajan 
II24II in a useful unification that simplifies the literature, since different Bayesian approaches 
differ only in the available information. Sato et al | |22[ | have tackled the construction of 
empirical priors. Our approach, which can also focuses on an empirical prior is based on 
a previously introduced multiscale-prior method Q40 which formalized ideas applied in the 
context of fMRI by |JTD, Q20. It relies on the maximum entropy (ME) method to transfer 
information gained at a coarse scale to start inference at a finner scale. 

Advances by Dale and Sereno Q5J permitted the inclusion of information from structural 
MRI to construct a Green’s function that is specific to the subject under study. This can 
be used in a Bayesian approach as such additional information can be incorporated in the 
construction of the model and therefore of the likelihood. Once a posterior distribution 
is constructed, it still remains to choose an appropriate numerical algorithm in order to 
extract relevant information in the form of expected values or MAP estimates. Sato et al 


use the Automatic Relevance Determination approach of Neal II15II through the Variational 
Bayes (VB) approximation, (see also Q3J) while Nummemaa et al. [|16( | analyzed essentially 
the same mathematical structure using both VB and Monte Carlo methods. A schematic 


description of the VB method we employ is given in section 2.1 


In sections 2.2 and 2.3 we describe the central part of this work, the method of back¬ 


ward renormalization priors based on entropic inference. We show how to use information 
obtained at a coarse renormalized scale to improve the starting point on a finner scale. This 
can be iterated up from a very coarse description of the system to the finnest scale. The pre¬ 
sentation is very general and the particular form of the set of renormalized lattices depends 
on the application at hand. 

Validation of the method appears in section [3] using simulated data sets. After calculating 
the forward model with a set of known dipole sources, we assess the results comparing 
the performance of the source localization of 2 dipoles at random positions and intensities 
using both the original VB method and the multscale approach. This is done comparing the 
distance between the localized dipole (or first N strongest dipoles) and the real one (used to 
generate the simulated data), and also the sensitivity of both algorithms (ROC curves) when 
the noise is increased. Finally, we applied our method to the problem of source localization 
of a face/mask paradigm in the data available at http://www.fil.ion.ucl.ac.uk/spm/data/ 
(SPM - Wellcome Trust Centre for Neuroimaging). A discussion and conclusions appear in 
the final section. 
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2 Methods 


Since our contribution lies in the proposed method, this is the central part of the paper. But 
first we describe the inverse problem in section 2.1 In this part we don’t make any advances 
and follow the work of Sato et al. ||22|| ). Then, in 2.2 we discuss the central part of our 


contribution, the multiscale approach to the prior distribution. 


2.1 The forward and inverse problems: from cortical sources to M/EEG data and 
back 


We follow Sato | |22l |l using a linear model that has been discussed by II17IL IT8T1 Ill3ll . ||14ll 
111811 (see also 112311 1. We denote by V the electric potential at the scalp surface, by J the 
density of dipole sources that give rise to those potentials and G the Green function that 
describes the electromagnetic medium. It represents the macroscopic physiological and geo¬ 
metrical details of the head known in this area as the Lead Field. 

Measurements are subject to noise E, that has been supposed to be gaussian and indepen¬ 
dent of space and time, so that for data V and dipole sources density J the model is 


V = GJ + 


(1) 


For M sensors, T the length of the time series of measurements, N the number of sites of the 
lattice where the dipoles live, the dimensions of the matrices V,J,G and E, are respectively 
MxT,NxT,NxM and M x T. Bayes theorem leads to 


P(J|VJ) = 


pMnnvm 

p(y\i ) 


( 2 ) 


The method that we follow is essentially the variational Bayes used by Sato et al. with a 
twist. In their method, the prior distribution of the dipole density P(J d ) = P({J d (r d (i))}) is 
parametrized by a set of parameters a d = {a d (i)} where A d is the lattice of positions r d (i) 
used to represent the cortex where the dipoles live. a d (i) is the inverse of the variance of 
the zero mean gaussian also called the precision, used as prior for the dipole density vector 
amplitude J d (r d (i)) at each site i e A d . The direction of each J d (r d (i)) vector is fixed and 
normal to the surface representing the cortex at that point. Incomplete knowledge of a d 
prompts the use of hyperpriors for each individual a d (i). It is reasonable, as they do, to use 
a set of Gamma hyperprior distributions for the set of a d (i). Integrating the gaussian priors 
over the Gamma distributed a d (i) leads to t-student distributions. So that at each site, the 
distributions of the density belong to the family given by 


P(J|d,y) 
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(3) 


Call collectively the set of hyperparameters of the Gammas G d . The inclusion of the data 
read from the electrodes, V , by the Variational Bayes method, leads to a dynamics of the 
hyperparameters of the gammas, which converges to a final value G d . Their method can be 
succinctly described by the mapping 


o° d 


q q Variational Bayes t 

'' —> o. 
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(4) 
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Their starting point is that every 0 d (r d (i)) e 0 d is the same, i.e a prior of the dipole density 
spatially invariant over the cortex. 

Our contribution consists of considering a different choice of the prior distributions. For 
this we use the theory of backward O renormalization priors, which are inspired in the 
multigrid prior approach used by QXO to study fMRI. By using entropic inference, we can use 
the posterior in a coarse scale to generate an informed prior in the next hnner scale. 

2.2 The Multiscale problem 


The relevant variables are the dipoles densities J and the electric potential (or magnetic 
fluxes) to be measured at the electrodes. The space where the variables live is a representa¬ 
tion of the cortex obtained from structural magnetic resonance data using Free Surfer image 
analysis suite, which is documented and freely available for download online 
(http://surfer.nmr.mgh.harvard.edu/ - [ED). This representation can be done at different 

levels of resolution, so that we define a set of renormalized lattices {A d } d=0 D . Each lattice 

is composed by sites r d (i) e A d with i = 1,_| A d |. The particular form of how a coarser or 

renormalized lattice A d _ 1 is obtained from the previous A d depends on the particular type of 
problem. For the problem at hand of EEG dipole sources, the first lattice A 0 was generated by 
introducing an icosahedron in the spherical surface (Figure [T]) obtained by inflating a brain 
hemisphere. Deflating the spheres results in a first lattice with 40 faces. Next, to generate the 
finer scales lattices A l3 ..., A D , each trianguar face of the inflated brain was into 4 triangles. 
More details about the position of dipoles in each face are given in section 3.1[ 

At each scale d there is a set of dipole density amplitudes, collectively denoted J d = 
{J d (r d (i))}. We denote the integration measure over the set of |A d | variables by dJ d . 

Consider just two consecutive scales, a coarse one d — 1 and the higher resolution d. 
Suppose we have solved the case at the d — 1 level and now we want to solve the problem at 
scale d. This will give a map to be iterated from the coarsest to the finest scale. The method 
to be used follows the simpler case of discrete variables presented in []4D is the maximum 
entropy (ME), and the aim is to obtain a distribution P(J d ,J d _ 1 ,V ) from a prior distribution 
Q(J d , J d _ l3 V). The method of choice is the ME because (i) when constraints are imposed 
and the prior already satisfies the constraints ME will result in a posterior equal to the prior 
and (ii) when the results of a measurement are considered as constraints, ME gives the same 
results as Bayes |9D • Bayesian usual update can be thought of as a special case of Maximum 
Entropy where the constraints arise from the data. 

To show (i) we perform a simple Maximum Entropy exercise: Consider a variable X that 
takes values x and that Q(x) represents our prior state of knowledge. New information 
is obtained, e.g that the expected value < /(x) > is known to have a particular value: 
< /(x) >= E. The update from Q(x) to P(x) is done by maximizing 


S[P||Q] = - 


P(x) 

PO)log—r^-dx + A 

QO) 


/(x)dx — E + A C 


P(x)dx — 1 


(5) 


The result, after satisfying normalization is the Boltzmann-Gibbs probability density P(x) = 
Q(x)exp^x)^ with A to be chosen to satisfy < /(x) > P = E. What is the value of A if < 
/(x) >q= E, if the prior already satisfied the constraint? It is simple to see that A = 0 and 
Z(A) = 1, so the reuse of old data in the from of constraints, again and again doesn’t change 
the density Q(x) that already satisfies the constraint. While this sounds trivial, it will be 
useful since data in Bayes updates can be written as constraints for Maximum Entropy. 

To prove condition (ii) we follow . Consider the problem where a distribution P((9) 
has to be obtained, first, from the knowledge that a measurement of X has yielded a datum 
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x'; second, that prior to the inclusion of such information our knowledge of 0 is codified by 
a distribution Q(0) and third, that the relation between X and 0 is codified by a likelihood 
Q(x|0). Thus we have to consider P(x,0) subject to constraints J d0P(x,0) = P(x) = 
5(x — x 7 ). This is not a single constraint, i.e. instead of a single Lagrange multiplier, we have 
to consider a function A(x) and maximize 


S[P | |Q] 


r P(x, 0) 

P(x, 0) log ———dxdO + 


A(x) 


Q(x,0) 

d0P(x, 0) — 5(x — x') ] dx + A 0 


P(x)dx — 1 


( 6 ) 


J J J 

After obtaining the joint density P(x, 0) by maximizing the entropy, we can calculate the 
desired marginal P(0) = J dxP(x, 0). The result is 


P(0) = Q(0|xO, (7) 

where Q(0|x / ) = is the Bayes posterior Q(0|x / ) given by Bayes theorem, which 

just follows from the rules of probability. This proves that maximum entropy as an inference 
engine justifies the usual Bayes procedure when the constraint is a datum such as knowing 
that a measurement of X turned out to give a value x'. Maximum entropy is used to show 
that Bayes theorem should be used in the inference process. 

Going back to the EEG problem we consider that the relevant space is formed by the 
dipole variables at the two scales and the electrode potentials. Thus we seek the maximiza¬ 
tion of 


p(j j vd 

S[P||Q] = - P{J d ,J d _ l , V )log d ’ d ~ U dJ d dJ d _ x dV (8) 

J ^ J d> J d-i> v ) 

to obtain P(J d ,J d _ 1 ,V), which in addition to normalization, is subject to 

• (A) The marginal P(V) = 5(V — v'\ the measured data is V. 

• (B) QC/^) is given, e.g. by Jth -1 i|0 d “ 1 (r d_1 (i))) for some parametric family 

/(J|0). 


• (C) Knowledge about the process of renormalization is coded by Q(J d | Jj.!) (see section 

mi) 

• Given J d , knowledge of J d _ 1 is irrelevant for V: Q(V|J d J d _ 1 ) = Q(V|J d ) 
Marginalization and the product rule of probability give 


QM = 


Q(J d -i)Q(J d |J d -i)dJ d _i. 


J 


(9) 


which can be calculated from (B) and (C). Solving the maximization problem and taking the 
marginal of the maximum entropy distribution we obtain 


= QVdMvVd) 

Q(v') 


( 10 ) 
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is given by what we would have expected, Bayes theorem, with the extra important ingre¬ 
dient brought in by equation [9j that the prior at this new scale is obtained by whatever 
information we have on J d _ l3 i.e. Q(J d _ 1 ) and the renormalization procedure Q(J d |J d _ 1 ). 

If we wish to restrict the distributions to products of some parametric form , e.g. t- 
students, we have to use Q(J d |d°, y°), from equation [ 9 ] given by 


Q(Jd\a d ,r d ) 


QUd-iK 


-vT f d- 


1 )QWd- iW d -i 


(ID 


This is the backward renormalization step. The idea is to determine which distribution of J d 
in the parametric space of the t-student distributions is closest to the integral on the right 
side of equation [Tl] So a posterior in the coarsest scale d — 1 induces a prior in the d scale. 
We can start at the lowest resolution with the same (ai, y l Q ) initial values at every site of the 
coarsest lattice A 0 , obtain via variational Bayes the parameters for the posterior of the J 0 
from equation[lO]and proceed for the next scales as represented by the map: 


. f f N BackRenorm 
(“d-t’/d-1) > 



( 12 ) 


Beginning with a uniform prior at the coarsest scale, that is a set of parameters (&?, y°) 
uniform over the lattice A 0 , we iterate the mapping 



BackRenorm . n n.VB.f f. BackRenorm . n n . VB 

(<*d>r d )->(<**, r d ) —* (« d+ 1 ,r d+ i)--- 


(13) 


to finally obtain a posterior at the finest scales described by (d^,^), the desired answer to 
the inference problem. 

Finally it must be emphasized that the marginal of the maximum entropy distribution 
P(J d _ 1 ) = J P(J d , J d -\, V)dJ d dV is given by Bayes theorem at the coarser level, P(J d _ a ) = 
Q(J d _ 1 )Q(V / |i d _ 1 /Q(V / ), showing that the new maximization of the entropy didn’t alter the 
result obtained by the previous step. Reusing the same data within the realm of maximum 
entropy is not the same as naively reusing the data using just Bayes theorem updating. For 
ME it is harmless, as it imposes a constraint already satisfied, while for Bayes it represents 
the belief that the data were independently re-obtained, leading to a unwarranted decrease 
of uncertainty. 


2.3 Backward renormalization priors 

We now investigate the backward renormalization step given by equation [lTJ Renormaliza¬ 
tion is seldom a tidy business, and the fact that the dipoles are vectors and that the direction 
of their sum is not necessarily the same as the perpendicular to the surface of the cortex, 
does complicate things even further. The cortex is represented by triangular faces that are 
not in the same plane and thus the renormalized face is not simply related to the finner scale 
faces. Approximations are needed to advance and suggest a specific form for the mapping 
in display [12] Numerically we have investigated this suggestion and found some variations 
on the theme that lead to good results. We can analyze the backward renormalization in the 
following simplified context. Call A = |A d _ 1 |/|A d | the ratio of degrees of freedom of a lattice 
at a stage d — 1 of renormalization, to the next, finner stage d. In this work A = 1/4. When 
going from one lattice to a coarser one density variables are approximately renormalized 
according to a scaled block average: 


iU ) 


(14) 
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where £(j) means that the sum is over the set of degrees of freedom at r d (i) that are blocked 
to form the coarser degree of freedom at position r d_1 (j). For independent J d (i) probability 
theory leads to the convolution 


Q-d- l(Jd-l(J)) 


n [dJ d (0Q(J d (0)] 5 


J i(j) 


i(j) 


Using the characteristic functions, <f>(k) = (Q(J)), the Fourier transforms of the distribu¬ 

tions: 

*d-i (fc)= [* d (Afc)] lA (15) 

So the prior distribution of the dipole at the finner scale position i can be chosen by inverting 
a Fourier transform: 


(16) 

For distributions stable under additions, this entails a simple backward renormalization of 
the distribution parameters. 

From all this development, the main information we obtained is that the expected value of 
the precision of the gaussian prior of the dipole density should decrease at the new lattice in 
comparison with that of the posterior at the previous coarser lattice. For the more intricate 
renormalization we have to consider, a further improvement obtained numerically in the 
simulations is that not only the variance of the prior should be larger, but that at a given 
lattice the inferred position of a dipole might be a little off and seem to be at a neighboring 
site. Thus we introduced what we call contamination, by adding variance from the nearest 
and next nearest sites: 


Q(J d (£)) = J?J^ [* d _! 


1 


a 


mo 

d +1 



(17) 


bringing in information from j, the parent site of i as well as the nearest neighbors (j„) and 
the next nearest neighbors (j nn ) of j. This prevents early commitment of the position of a 
dipole. In the average, the precision scales as the backward renormalization step suggests. 
For all the sites i in the finner lattice A d that give rise to a given renormalization block at 
r d_1 0'), the prior of J(r d (£)) will have renormalized parameters inherited from the block 
variable distribution of J(r d_1 (j)). 
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Figure 1: Cortex, left hemisphere (Left) original as obtained from a the structural MR image, 
(center) inflated, (right) spherical. Gray scale code the curvature in the original image. 



Figure 2: Representation of the cortex at different scales. Gray scale is used to represent 
finer scale balanced surfaces derived from the first representation, the icosahedral. The 
dipole density vectors are constrained to be at a site at the center and perpendicular to each 
triangular face. 


3 Results and discussions 


Depending on the choice of the initial and final renormalized lattice several methods can 
be defined. We denote them by BRVB fcJc /, standing for Backward Renormalization Variational 
Bayes and k indicates the initial coarsest scale, k' the finest scale. The final lattice is obtained 
by five steps of renormalization. The full method is BRVB 04 . The original Variational Bayes 
is VB, and since it runs at the finest scale only, and is the same as BRVB 44 . We have also 
compared its performance to that of the MNE [ |11|| . 

We first present results obtained by solving the inverse problem for artificial data gen¬ 
erated by the forward problem, obtained by simulating two dipoles in the original lattice 
generated by Free Surfer. Manipulations include (i) the positions and magnitudes of the 
dipoles and (ii) noise corruption of the data. To quantify the quality of the inference we con¬ 


sidered the distance between real and localized dipoles (defined in section 3.2) estimated by 
averaging over ph 200 runs. 

Since studying the effect of adding noise to cases where BRVB 44 fails is not informative, 
we considered five different configurations where both algorithms performed equally well. 
Then, noise level was increased to study the decay of performance of the different methods. 

We end by showing the results of applying the method to publicly available EEG data in a 
face recognition task (http://www.fil.ion.ucl.ac.uk/spm/data/ SPM - Wellcome Trust Centre 
for Neuroimaging). 
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3.1 Simulations 


We start by considering two main active dipoles in the original lattice at random positions 
r(l) and r(2), separated by a distance L ( 55 < L < 65 mm) from each other. 

Each simulation starts at t = 0 runs for T = 51 samples, with each dipole intensity 
proportional to siny. Results shown in the images comes from analysis made from data 
collected at the peak value t = 26. 

The forward problem was solved using SPM (http://www.fil.ion.ucl.ac.uk/spm/ - Well¬ 
come Trust Centre for Neuroimaging) and Fieldtrip (http://fieldtrip.fcdonders.nl/ - Centre 
for Cognitive Neuroimaging of the Donders Institute for Brain, Cognition and Behaviour) 
MATLAB toolboxes. They permit calculating the Lead Field using 3-spheres aproximation 
(for simulations) and realistic head model BEM solutions (for analyses of real data). The 
Lead Field using 3-spheres was used to generate the potential in the head surface and the 
signal was then corrupted with NSR of 0.1. We used 128 electrodes positions and MRI tem¬ 
plate image available with the SPM toolbox. The inversion methods (MNE, BRVB 04 through 
BRVB 44 ) were implemented in MATLAB. 

As described in section 2.2[ each one of the 5 lattices A 0 ,..., A 4 has 40, 160, 640, 2560, 
10240 faces, respectively. The dipoles are positioned using then mean of vertices and faces of 
the original lattice generated by Free Surfer. In the spherical surface, it is possible to localize 
the vertices and faces above each divided face (or the faces of the icosahedron for A 0 ). These 
original faces and vertices have their position in the original folded surface. Each dipole was 
located in the mean of those vertices and oriented as the mean of the normals of those faces. 
This could introduce localization bias since the density of faces in the original lattice is not 
homogeneous. The same set of lattices was generated using balanced representations, but 
the performance in all algorithms did not change (results not reported for brevity), so this 
option was discarded. For more information see Ill2ll . 

Figure [3] shows as an example a particular run. In the first line, first panel in the left, the 
real dipole sources used to generate the data V. The following panels shows the estimated 
sources using Minimum Norm or MNE, BRVB 04 and BRVB 44 , respectively. Figure [4] shows the 
variance or inverse precision 1 /a n initial value in the two first columns and after convergence 
in the two last columns, at each scale for BRVB 04 in the first 5 lines and in the single grid 
BRVB 44 in the last. 

This is a nice example to show because it exemplifies the case when the random choice 
of the location of the sources placed one on a deep position, which is known to present a 
problem since the Lead field is very small. This problem has been addressed by Il20!l using 
the precision matrix, but here this information is extracted from the data. While the BRVB 44 
found only one, the more superficial source, the BRVB 04 was able to find both sources. For the 
two source case, about 10% of the 100 runs analyzed showed this difference in the behavior 
for deep sources and the method was seen to be robust under this condition. 
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Figure 3: The top left box represents the real current densities used to simulate the potential, 
while the others represents the different localization methods. Top and bottom line of each 
box represents the bottom and right view of the inflated cortex facing right, respectively. The 
first column represents the partially inflated brain and the second zooms into the indicated 
region in the first column. Color codes the current intensity, negative facing inwards and 
positive outwards. The cortex is represented by a lattice of 2.8 x 10 4 triangles and the fifth 
order grid is made up by 10240 = 2 x 20 x 4 4 possible dipole locations. 
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Initial Variance 0 RV B 04 Variance after VB converges 



Figure 4: Evolution of 1 /a under iteration of the backward renormalization algorithm. First 
and third columns shows the bottom view of the cortex surface, while second and fourth 
columns shows the right hemisphere, both facing right. First and second columns shows 
initial variance in each region, while third and fourth columns shows variance after the VB 
algorithm converged in that specific grid, with a given initial value. First to fifth rows show 
the stages of BRVB algorithm in each grid with: first BRVB 00 (40 = 2 x 20 faces), second 
BRVB 01 (160) third BRVB 02 (640), fourth BRVB 03 (2560) and fifth BRVB 04 (10240) (2 x 20 x 
4 d faces). The last line is the variance in the VB algorithm starting with a hyperparameter a 
uniform in the fifth lattice, BRVB 44 . Notice how aggressive is the convergence in the variance 
when the VB method starts with a uniform prior in the last surface, as evidenced by the 
zoomed square. 
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3.2 Validation of the method 

The main reason for simulating this problem is that it allows for comparisons with the real 
sources, used to generate the data. We used mainly the distance between the strongest and 
second strongest localized dipoles. 

To begin we compare the distances in mm between real and localized dipoles in the 200 
simulations. As we can see in Figure [5j we first compare the strongest dipole found by 
the methods BRVB 44 and BRVB 04 and the strongest real dipole. We also compare it to the 
minimum between the distances of the first and second strongest localized ones, ignoring 
differences in amplitudes. It is interesting to see that even in this case there is a peak at 60 
mm, specially for the BRVB 44 . 

Finally we show the ordered errors for all simulations, from BRVB 44 to BRVB 04 , evidencing 
the improvement in the use of each additional localization. 
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Figure 5: First line shows the histograms for the distance (mm) between the strongest real 
source and that found by different algorithms: BRVB 04 and BRVB 44 . Second line is the same 
type of histogram but showing the smallest distance between the strongest real source and 
the two strongest localized dipoles. This way, since we used two sources for generating the 
field, we don’t care which one was localized as the strongest, only it’s position. The last line 
shows the same information as the second histogram, but ordering the errors in ascendant 
fashion. Obtained from inference made on 200 different forward problems, with random 
positions and distances between the two dipoles randomly chosen between 55 and 65 mm. 

We also analyzed how different algorithms fare under the addition of different levels of 
noise. 

We identified 5 simulations where both the BRVB 44 , BRVB 34 and BRVB 04 obtained similar 
good results from data corrupted at low noise-to-signal ratios (NSR=0.1). In order to com¬ 
pare runs with similar good results, we have to restrict to cases where the sources are located 
in more superficial regions, since the performance for sources in deep locations are quite dif¬ 
ferent with the single scale method not even identifying a result to be compared. We added 
uncorrelated gaussian noise of zero mean and variance cr 2 to each of the M components 
(electrodes) of the vector V of data voltages so that 

M cr 2 

NSR = —-—. (18) 

V'V 

Finally, in Figure [6j besides the error distance in localization, we also plotted the ROC 
space for the three algorithms, and the curves for the distance between the average false pos- 
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itives positions to the real dipoles. Notice how the the BRVB 04 has a clear superior resistance 
to the increased noise. 
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Average distance between false positives and the first current peak Average distance between false positives and the second current peak 
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1 - SPEC NSR 

Figure 6: Performance of the different algorithms. The horizontal axis represents the noise to 
signal ratio (apart from the ROC space). First line : average distance between false positives 
and the most intense real dipole on the left, and the second most intense on the right. Second 
line : ROC space on the left (the size of the markers represents the intensity of noise), and 
average distance between the second most intense true positive and the second most intense 
real dipole on the right. 

3.3 Real Data from EEG 

We applied the BRVB 04 and the BRVB 44 in real data available at 

http://www.fil.ion.ucl.ac.uk/spm/data/mmfaces/ (SPM - Wellcome Trust Centre for Neu¬ 
roimaging). The experiment consists of 128 electrodes set in a Face / No-Face stimulus. We 
preprocessed the data as specified in the tutorial for source reconstrucion in SPM, and com¬ 
puted the difference between the average of each condition. The resulting scalp potential 
was used for the source localization. The sources shown in Figure [7] were found by BRVB 04 , 
BRVB 44 and MNE. Although they are in general consistent with the literature for this ex¬ 
periment II10II . the location of the sources as well the intensity is clearly different. Further 
studies are necessary to understand the nature of this difference in this specific protocol. 

4 Conclusion 

The main contribution of this paper is to use a Maximum Entropy inferential chain of projec¬ 
tions to systematically transfer posterior information from one coarse scale to the prior of a 
finner scale. This approach is not restricted to M/EEG or fMRI imaging methods, but should 
apply to inference about the localization of sources in any spatially extended system and thus 
has a potentially wide scope of applications. We analyzed the behavior of the hierarchical 
Bayesian approach for solving the M/EEG inverse problem. This was introduced into this 
context by ||22[| who used one coarser grid to restrict the search in a finer scale, without 
propagating the converged variance to the next scale. As shown by Nummenmaa et al Hl6ll . 
the Variational Bayes approach is very sensitive to the initial values of the hyperparameters. 
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Figure 7: Bottom view of the cortex looking up (most significant locations are on the bottom 
in that instant). From left to right : BRVB 04 , BRVB 44 and MNE at approximately 170 ms. 


Our multiscale approach was inspired by the fMRI work of Ql]] to systematically construct 
prior distributions. An important difference is that the M/EEG data introduces a further com¬ 
plication by not being a scalar as in the fMRI case, leading us to defining a priori directions 
for the dipoles, or risk having to deal with a nonlinear problem with a too large number of 
degrees of freedom. This simplicity permitted the more sophisticated formulation based on 
the backward renormalization group for discrete variables done in Q4B Our method permits 
setting the initial value for the hyperparameters in the VB approach on a finer scale from the 
value obtained after VB convergence in the coarser previous scale. The method being robust 
to choices in the coarsest scale. However the renormalization group anlysis of this harder 
problems remains incomplete and further work in this direction will follow. 

Second, we have done extensive simulations to validate the method. We presented results 
in simulated data that suggests that this approach is a valid aid to increase the precision of 
the localized dipoles and also to increase the performance in the presence of noise. The 
backward renormalization priors permitted a more systematic localization of deep sources 
far from the skull. 

Third, as mentioned by Nummenmaa et al Hl6ll in the discussion, there appears to be a 
natural trade-off between choosing a method providing smoother but unique solution, and 
the hierarchical approach with better spatial resolution and a multitude of candidate solu¬ 
tions. We claim that our method might represent a good direction in finding a compromise 
between both solutions. 

We stress that the main advantages of a Maximum Entropy or Bayesian approach is the 
clear identification of the often hidden assumptions underlying an algorithmic approach. 
This permits concentrating on the different pieces needed to solve the puzzle. It illustrates 
the fact that prior information not only goes into the prior but can improve the likelihood. 
While we need a better electromagnetic model of the brain as well as understanding the noise 
processes, here we just looked at the prior, but this can certainly be improved. A natural idea 
is to improve our method with anatomical prior information as used by ||19ll . 
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